Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable pointsize #251

Closed
wants to merge 15 commits into from
Closed

Conversation

teunbrand
Copy link

@teunbrand teunbrand commented Jan 17, 2024

This PR aims o fix #83.

Briefly, it takes Paul's advice here to cast everything to an absolute unit as soon as possible, and then refuse to deal with other types of units ever again (unless absolutely needed). This should make the segments attach nicely to the normally sized circle point (shapes 1, 16, 19, 21, but not 20). This is also stable upon resizing the device window.

Reprex from #83 (comment):

devtools::load_all("~/packages/ggrepel")
#> ℹ Loading ggrepel
#> Loading required package: ggplot2

set.seed(42)
d <- data.frame(
  x = c(1, 2, 2, 3),
  y = c(1, 2, 3, 1),
  pointsize = c(0, 2, 1, 0),
  label = sprintf("label%s", 1:4)
)

p <- ggplot(d, aes(x, y)) +
  geom_text_repel(
    min.segment.length = 0.1,
    box.padding = 0.2,
    aes(label = label, point.size = pointsize),
    size = 5, max.iter = 1e5, max.time = 1
  ) +
  continuous_scale(
    aesthetics = c("size", "point.size"), 
    palette = scales::area_pal(c(2, 50)), guide = "none"
  )
  
p + geom_point(aes(size = pointsize), color = "grey50")

Shape = 20 has a smaller diameter, so the segment doesn't attach for that shape.

p + geom_point(aes(size = pointsize), color = "grey50", shape = 20)

Created on 2024-01-17 with reprex v2.1.0

I had to update 1 test that didn't draw any segments by default.
Visual tests needed updating due to tiny changes in location of some of the labels, usually something like 0.01 unit in a scale of 100s. I presume these are rounding errors and otherwise harmless.

Please have a go and play around a bit yourself as I hadn't considered every option under the sun for this.

# `.pt / .stroke` ~= 0.75 accounts for a historical error in ggplot2
point.size <- x$data$point.size
point.size[is.na(point.size)] <- 0
point.size <- point.size * .pt / .stroke / 20
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this here was the magic incantation you were looking for in #83 (comment)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, I don't think I would have ever figured this out by myself. 🙏

You're a wizard 🧙‍♂️

@slowkow
Copy link
Owner

slowkow commented Jan 22, 2024

Some of the examples failed to build after these changes, so I had to do some fiddling with NA and converting to units.

After I fixed that, the forces seem to be a bit off now. Some examples still look good, but others have too little force so the labels are squished together or on top of each other.

Playing with units is harder than I thought. But I'll keep trying and update here when I feel like I have some good results, because I think your solution makes the segments the perfect length.

@teunbrand
Copy link
Author

Thanks for the heads up and trying out various options! Could you point me to a specific example that would be broken with this PR? I could try and have a go at preventing any serious damage.

@slowkow
Copy link
Owner

slowkow commented Jan 23, 2024

The variable point size example is perfect! it's incredibly satisfying to see the segment lengths perfectly calculated. 💯

But there are a few examples where my arbitrary choice of force magnitude is now a bit off after switching the unit of measurement.

Example 1

Here is one example that used to look good, but now it is squished:

set.seed(42)
ggplot(mtcars, aes(x = wt, y = 1, label = rownames(mtcars))) +
  geom_point(color = "red") +
  geom_text_repel(
    force_pull   = 0, # do not pull toward data points
    nudge_y      = 0.05,
    direction    = "x",
    angle        = 90,
    hjust        = 0,
    segment.size = 0.2,
    max.iter = 1e4, max.time = 1
  ) +
  xlim(1, 6) +
  ylim(1, 0.8) +
  theme(
    axis.line.y  = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.y  = element_blank(),
    axis.title.y = element_blank()
  )

Example 2

Here is another one where the result looks worse than it did before, maybe the number of iterations might be too low with the new units:

set.seed(42)
mtcars$label <- rownames(mtcars)
mtcars$label[mtcars$mpg < 25] <- ""
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), label = label)) +
  coord_polar(theta = "x") +
  geom_point(size = 2) +
  scale_color_discrete(name = "cyl") +
  geom_text_repel(show.legend = FALSE) + # Don't display "a" in the legend.
  theme_bw(base_size = 18)

Possible approaches

I can think of two approaches to try to address this. The first should probably be trivial to implement (but I failed, so maybe not). The second is a lot more work but probably more robust. There are probably more approaches that I have not yet considered.

Approach 1

The first approach is to figure out what scaling factors to change. For example, I tried fiddling with these two numbers:

ggrepel/R/geom-text-repel.R

Lines 438 to 439 in 1144585

force_push = x$force * 1e-6,
force_pull = x$force_pull * 1e-2,

After some playing, I could not perfectly restore the old behavior in ggrepel 0.9.4. I tried changing 1e-6 to 1e-5 and 1e-4, and I think 1e-4 was pretty good, but still a bit off.

Maybe a better approach is to multiply all coordinates by some scaling factor, and leave these two numbers as-is? I don't know if that will work.

Maybe there are smarter ways of choosing values for force magnitudes? I don't know, but I bet someone has done this in the past.

Approach 2

The second approach is to modify the positioning algorithm so it is more robust. Some time after I figured out how to use a physical repulsion/attraction simulation to position labels, I found a totally different approach in d3 voronoi-labels. I like the Voronoi algorithm, because it is simpler and does not depend on arbitrary numbers for force calculations. I wish I could go back in time and do it that way from the start, but this has been a learning journey for me, so ggrepel has a physical simulation instead.

Some day, I might consider changing to a Voronoi algorithm in a new version of ggrepel. It may be possible to keep the old simulation algorithm around for backwards-compatibility if people like the way it looks. But a newer algorithm using Voronoi calculations will probably be more efficient and robust. Perhaps some physical simulation fine-touches can improve label placement after the initial Voronoi placement, and I bet the arbitrary choices of force magnitudes will be less important when we have nice initial positions.

Also, I realize this PR is supposed to be about segment length, so I understand that the Voronoi idea is a bit much — I just wanted to share the ideas bouncing around in my head. Feel free to ignore! Just a thought.

@teunbrand
Copy link
Author

teunbrand commented Jan 24, 2024

Thanks for the examples Kamil. I agree with you that these look worse than before.

Knowing nothing about the internals of the attraction/repulsion simulation underneath, I think the correct multiplier should be the width or height, as everything is now scales from [0, 1] to [0, width] and [0, height]?

Maybe a better approach is to multiply all coordinates by some scaling factor, and leave these two numbers as-is? I don't know if that will work.

In theory you should be able to divide all x's by width and y's by height and you should have the calculation similar to before when everything was in native units. You'd also have to re-multiply after the repelling algorithm I suppose.

The voronoi algorithm sounds nice, but I think rather than it being the algorithm for label positioning, I think it is a good initialiser (instead of random jitter, I suppose). For example in those big voronoi cells it doesn't make sense to place the label so far away when there is plenty of space near the point itself.

EDIT: Totally read over the remark suggesting something similar 😅

Perhaps some physical simulation fine-touches can improve label placement after the initial Voronoi placement,

@slowkow
Copy link
Owner

slowkow commented Jan 24, 2024

If you want to give the scaling a shot, I'd appreciate it! 😃

Otherwise, I will try to come back to this sometime in the future and see if I can get the scaling close enough so that people don't complain about any changes in performance. I feel like I really messed up by choosing "native" without understanding what it meant at the time. My current understanding is that "native" forces width and height to both be in the range 0-1 instead of allowing width range be different than height range.

Anyway, thank you so much for the discussion and the help! There's always so much to learn.

@teunbrand
Copy link
Author

Thanks for the discussion here Kamil! I'm closing this PR in favour of #265.

@teunbrand teunbrand closed this Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flexible point.padding
2 participants